Estimating graph distance and centrality on shared nothing architectures

Authors

  • Atilla Soner Balkir
  • Huseyin Oktay
  • Ian T. Foster
Abstract

We present a parallel toolkit for pairwise distance computation in massive networks. Computing the exact shortest paths between a large number of vertices is a costly operation, and serial algorithms are not practical for billion-scale graphs. We first describe an efficient parallel method to solve the single source shortest path problem on commodity hardware with no shared memory. Using it as a building block, we introduce a new parallel algorithm to estimate the shortest paths between arbitrary pairs of vertices. Our method exploits data locality, produces highly accurate results, and allows batch computation of shortest paths with 7% average error in graphs that contain billions of edges. The proposed algorithm is up to two orders of magnitude faster than previously suggested algorithms and does not require large amounts of memory or expensive high-end servers. We further leverage this method to estimate the closeness and betweenness centrality metrics, which involve systems challenges dealing with indexing, joining, and comparing large datasets efficiently. In one experiment, we mined a real-world Web graph with 700 million nodes and 12 billion edges to identify the most central vertices and calculated more than 63 billion shortest paths in 6 h on a 20-node commodity cluster. Copyright © 2014 John Wiley & Sons, Ltd.
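The abstract does not say how pairwise distances are estimated, so the following is only a minimal single-machine sketch of one common approach, landmark-based estimation, and should not be read as the authors' algorithm. It assumes an undirected, unweighted graph stored as an adjacency dictionary. The names `bfs_distances`, `estimate_distance`, and `estimate_closeness` are hypothetical. Exact single-source distances are computed from a few landmark vertices, and the distance between any pair is then bounded from above through the triangle inequality.

```python
from collections import deque


def bfs_distances(adj, source):
    """Exact hop distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def estimate_distance(landmark_dists, u, v):
    """Upper bound on d(u, v): min over landmarks l of d(u, l) + d(l, v).

    Assumes an undirected graph, so d(l, u) == d(u, l).
    Returns infinity if no landmark reaches both vertices.
    """
    best = float("inf")
    for dist in landmark_dists:
        if u in dist and v in dist:
            best = min(best, dist[u] + dist[v])
    return best


def estimate_closeness(adj, landmark_dists, u):
    """Approximate closeness: (n - 1) / sum of estimated distances from u."""
    others = [v for v in adj if v != u]
    total = sum(estimate_distance(landmark_dists, u, v) for v in others)
    return len(others) / total if total > 0 else 0.0


# Toy example: the path graph 0 - 1 - 2 - 3 - 4 with landmarks 0 and 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
landmark_dists = [bfs_distances(adj, l) for l in (0, 2)]

d_04 = estimate_distance(landmark_dists, 0, 4)  # exact: landmark 2 lies on the path
d_34 = estimate_distance(landmark_dists, 3, 4)  # overestimate: true distance is 1
c_2 = estimate_closeness(adj, landmark_dists, 2)
```

The estimate is exact whenever some landmark lies on a shortest path between the two vertices and is an overestimate otherwise, which is why landmark choice matters for accuracy. In a shared-nothing setting, each landmark's single-source computation would be distributed across machines rather than run as an in-memory BFS.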


Similar resources

Performance Evaluation of a Two-Level Hierarchical Parallel Database System

Two typical architectures of parallel database systems are the shared-everything and shared-nothing architectures. Shared-everything architecture provides better performance than the shared-nothing architecture but it is not scalable to large system sizes. On the other hand, shared-nothing architecture provides good system scalability but is sensitive to data skew. Hierarchical architectures ha...


Comparative analysis of organizational processes by the use of the social network concepts

This study presents a comparative analysis of redesigned models of organizational processes by making use of social network concepts. After doing re-engineering of organizational processes which had been conducted in the headquarters of Mazandaran Province Education Department, different methods were used which included the alpha algorithm, alpha⁺, genetics and heuristics. Every one of these me...


Greed is Good: Optimistic Algorithms for Bipartite-Graph Partial Coloring on Multicore Architectures

In parallel computing, a valid graph coloring yields a lock-free processing of the colored tasks, data points, etc., without expensive synchronization mechanisms. However, coloring is not free and the overhead can be significant. In particular, for the bipartite-graph partial coloring (BGPC) and distance-2 graph coloring (D2GC) problems, which have various use-cases within the scientific comput...


An Analysis of Three Transaction Processing Architectures

In this paper, we investigate the issues involved in using multiprocessors for high performance transaction processing applications. We use a simulation model to compare the performance of three different architectures, namely, Shared Everything, Shared Nothing and Shared Disks. In Shared Everything, any processor can access any disk and all memory is shared. In Shared Nothing, neither disks no...


Extending LOGFLOW with Parallel Relational Database Operations

LOGFLOW is a parallel Prolog system. It is similar to recent parallel database systems concerning its dataflow execution model and its capability of running on shared–nothing architectures. The similarities between LOGFLOW and parallel database systems show that a new database system can be developed based on LOGFLOW in that both relational and deductive queries can be executed. In this paper w...



Journal:
  • Concurrency and Computation: Practice and Experience

Volume 27

Publication date: 2015